Owing to recent advances in computer vision, traffic video data has become a key resource for monitoring traffic congestion conditions. This work presents a unique technique that applies a color-coding scheme to traffic data before training a deep convolutional neural network. First, the video data are converted into an image dataset. Then, the You Only Look Once (YOLO) algorithm is used for vehicle detection. A color-coding scheme converts the image dataset into a binary image dataset, and these binary images are fed into the deep convolutional neural network. On the UCSD dataset, we obtained a classification accuracy of 98.2%.
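The binarization step described above can be sketched as follows: detected vehicle boxes are rasterized into a binary occupancy image before being fed to the CNN. A minimal illustration (the function name and box format are our assumptions, not from the paper):

```python
import numpy as np

def boxes_to_binary_image(boxes, height, width):
    """Rasterize detected vehicle boxes into a binary occupancy image.

    boxes: iterable of (x1, y1, x2, y2) pixel coordinates from a detector
    such as YOLO. Pixels covered by any box are set to 1, all others to 0.
    """
    binary = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        binary[y1:y2, x1:x2] = 1
    return binary

# Two detected vehicles on an 8x8 frame.
frame = boxes_to_binary_image([(0, 0, 2, 2), (4, 4, 7, 7)], 8, 8)
```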
Pneumonia, a respiratory infection brought on by bacteria or viruses, affects a large number of people, especially in developing and impoverished countries, where high levels of pollution, unclean living conditions, and overcrowding are frequently observed alongside insufficient medical infrastructure. Pneumonia can lead to pleural effusion, a condition in which fluid fills the lung and complicates breathing. Early detection of pneumonia is essential for ensuring curative care and boosting survival rates. Chest X-ray imaging is the most commonly used method for diagnosing pneumonia. The purpose of this work is to develop a method for the automatic diagnosis of bacterial and viral pneumonia in digital X-ray images. This article first presents the authors' technique and then gives a comprehensive report on recent developments in the field of reliable pneumonia diagnosis. In this study, we fine-tuned state-of-the-art deep convolutional neural networks to classify chest X-ray images and tested their performance. Deep learning architectures are compared empirically: VGG19, ResNet152V2, ResNeXt101, SEResNet152, MobileNetV2, and DenseNet201 are among the architectures tested. The experimental data consist of two groups, sick and healthy X-ray images. Because appropriate action should be taken against the disease as soon as possible, rapid identification models are preferred. DenseNet201 showed no overfitting or performance degradation in our experiments, and its accuracy tends to increase as the number of epochs increases. Furthermore, DenseNet201 achieves state-of-the-art performance with a significantly smaller number of parameters and within a reasonable computing time, outperforming the competition with a testing accuracy of 95%. Each architecture was trained using Keras with Theano as the backend.
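DenseNet201's parameter efficiency stems from dense connectivity: each layer receives the concatenation of all preceding feature maps and contributes only a fixed number of new channels (the growth rate). A toy numpy sketch of that channel bookkeeping, with random weights and flat feature vectors purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, num_layers, growth_rate):
    """Toy dense block on flat feature vectors: every layer sees the
    concatenation of the input and all previous layers' outputs, and
    contributes `growth_rate` new features."""
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features)          # all preceding feature maps
        w = rng.standard_normal((growth_rate, inp.size))
        out = np.maximum(w @ inp, 0.0)          # linear + ReLU
        features.append(out)
    return np.concatenate(features)

x = np.ones(16)                                 # 16 input channels
y = dense_block(x, num_layers=4, growth_rate=12)
# Output width grows only linearly: 16 + 4 * 12 = 64 features.
```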
Diabetic Retinopathy (DR) is a leading cause of vision loss worldwide, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System, which classifies DR grading, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations using the Wasserstein distance and adversarial-learning-based entropy minimization. In addition, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach is robust to a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly supervised manner.
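The attention-based selection of lesion information can be illustrated, in highly simplified form, as a softmax weighting over per-lesion feature vectors. In DRG-Net the relevance scores come from trained attention layers; here they are supplied directly, so this is only a sketch of the pooling mechanics:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(lesion_features, scores):
    """Weight per-lesion feature vectors by relevance scores and pool them
    into one classification feature (a convex combination, so the most
    significant lesion dominates)."""
    weights = softmax(scores)           # relevance of each lesion region
    pooled = weights @ lesion_features  # weighted sum of region features
    return weights, pooled

feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # 3 lesion regions
weights, pooled = attention_pool(feats, np.array([2.0, 0.0, 0.0]))
```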
The ability to distinguish between different movie scenes is critical for understanding the storyline of a movie. However, accurately detecting movie scenes is often challenging, as it requires reasoning over very long movie segments. This is in contrast to most existing video recognition models, which are typically designed for short-range video analysis. This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. Our model, dubbed TranS4mer, is built using a novel S4A building block, which combines the strengths of structured state-space sequence (S4) and self-attention (A) layers. Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies. Afterward, the state-space operation in the S4A block is used to aggregate long-range inter-shot cues. The final TranS4mer model, which can be trained end-to-end, is obtained by stacking S4A blocks one after another multiple times. Our proposed TranS4mer outperforms all prior methods on three movie scene detection datasets, including MovieNet, BBC, and OVSD, while also being $2\times$ faster and requiring $3\times$ less GPU memory than standard Transformer models. We will release our code and models.
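The S4A block can be caricatured in a few lines: self-attention within each shot, then a linear state-space recurrence across shot summaries. Everything below, including the mean pooling and the scalar `decay` standing in for the learned state matrix of a real S4 layer, is a simplification for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x):
    """Single-head self-attention over the frames of one shot (weight
    matrices omitted for brevity: queries = keys = values = x)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ x

def s4a_block(shots, decay=0.9):
    """Simplified S4A-style block: self-attention captures short-range
    intra-shot dependencies, then a diagonal state-space recurrence
    aggregates long-range inter-shot cues."""
    shot_feats = [self_attention(s).mean(axis=0) for s in shots]
    state = np.zeros_like(shot_feats[0])
    outputs = []
    for f in shot_feats:
        state = decay * state + f   # linear state-space update across shots
        outputs.append(state.copy())
    return np.stack(outputs)

shots = [rng.standard_normal((5, 8)) for _ in range(3)]  # 3 shots, 5 frames each
out = s4a_block(shots)
```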
An expansion of aberrant brain cells is referred to as a brain tumor. The brain's architecture is extremely intricate, with several regions controlling various nervous system processes. A brain tumor can develop in any portion of the brain or skull, including the brain's protective lining, the base of the skull, the brainstem, the sinuses, the nasal cavity, and many other places. Over the past ten years, numerous advances have been made in the field of computer-aided brain tumor diagnosis. Recently, instance segmentation has attracted a lot of interest in numerous computer vision applications. It seeks to assign distinct IDs to different scene objects, even if they belong to the same class, and is typically performed with a two-stage pipeline. This study presents brain tumor segmentation using YOLOv5. YOLO takes its dataset as images with corresponding text annotation files. You Only Look Once (YOLO) is a popular and widely used algorithm, well known for its object detection capability; YOLOv2, v3, v4, and v5 are among the versions experts have published in recent years. Early brain tumor detection is one of the most important tasks facing neurologists and radiologists, yet manually identifying and segmenting brain tumors from Magnetic Resonance Imaging (MRI) data can be difficult and error-prone. An automated brain tumor detection system is therefore necessary for early diagnosis. Our model distinguishes three classes: meningioma, pituitary, and glioma. The results show that our model achieves competitive accuracy with reasonable runtime on an Apple M2 10-core GPU.
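The text annotation files mentioned above follow the YOLO label convention: one object per line, as `class x_center y_center width height` with coordinates normalized to [0, 1]. A small parser (the class-id ordering below is an assumption, not taken from the paper):

```python
CLASSES = ["meningioma", "pituitary", "glioma"]  # assumed id order

def parse_yolo_line(line, img_w, img_h):
    """Convert one YOLO label line (normalized center-format box) into a
    class name and a pixel (x1, y1, x2, y2) corner box."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return CLASSES[int(cls)], (x1, y1, x2, y2)

label, box = parse_yolo_line("2 0.5 0.5 0.25 0.5", img_w=400, img_h=400)
# label == "glioma", box == (150.0, 100.0, 250.0, 300.0)
```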
Network intrusion detection systems (NIDSs) play an important role in computer network security. Among the several detection mechanisms, anomaly-based automated detection significantly outperforms the others. Amid the sophistication and growing number of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDS. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our developed method to various machine learning and deep learning algorithms to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion based on a set of benchmarked performance analysis criteria. Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022, with an accuracy of 99.99% and 100%, respectively, and no overfitting or Type-I/Type-II error issues.
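The SMOTE step in the pre-processing pipeline oversamples the minority class by interpolating between minority samples. A minimal sketch (real SMOTE interpolates toward one of the k nearest minority neighbours; this simplification picks a random partner instead):

```python
import random

def smote_minimal(minority, n_new, rng=random.Random(0)):
    """Minimal SMOTE sketch: create synthetic minority samples by linear
    interpolation between two randomly chosen minority samples."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + lam * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
new = smote_minimal(minority, n_new=5)  # 5 synthetic minority samples
```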
Purpose: Parallel imaging accelerates magnetic resonance imaging (MRI) data acquisition by acquiring additional sensitivity information with an array of receiver coils, which reduces the number of phase-encoding steps. Compressed sensing MRI (CS-MRI) has gained popularity in the medical imaging field because of its lower data requirements compared with parallel imaging. Both parallel imaging and compressed sensing (CS) speed up conventional MRI acquisition by minimizing the amount of data captured in k-space. Since acquisition time is inversely proportional to the number of samples, reconstructing an image from reduced k-space samples yields faster acquisition but introduces aliasing artifacts. This paper proposes a novel generative adversarial network (GAN), RECGAN-GR, supervised with multi-modal losses for de-aliasing the reconstructed images. Methods: In contrast to existing GAN networks, the proposed method introduces a novel generator network integrated with dual-domain loss functions, including weighted magnitude and phase loss functions as well as a parallel-imaging-based loss, namely the GRAPPA consistency loss. A k-space correction block is proposed that allows the GAN to automatically discard unnecessary generated data, making the convergence of the reconstruction process faster. Results: Comprehensive results show that the proposed RECGAN-GR achieves a 4 dB improvement in PSNR over GAN-based methods and a 2 dB improvement over conventional state-of-the-art CNN methods available in the literature. Conclusion and significance: The proposed work contributes to a significant improvement in image quality for low retained data, enabling 5x or 10x faster acquisition.
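The premise that fewer k-space samples mean faster acquisition at the cost of aliasing can be demonstrated directly: zero out most phase-encoding lines and reconstruct by inverse FFT. A minimal numpy sketch (uniform line skipping here; practical CS-MRI uses incoherent sampling patterns):

```python
import numpy as np

def undersample_kspace(image, keep_every=4):
    """Keep every `keep_every`-th phase-encoding line of k-space, zero the
    rest, and reconstruct by inverse FFT. Acquisition time scales with the
    number of sampled lines, so keep_every=4 models roughly 4x faster
    scanning at the cost of aliasing artifacts."""
    kspace = np.fft.fft2(image)
    mask = np.zeros_like(kspace)
    mask[::keep_every, :] = 1.0          # retained phase-encoding lines
    return np.abs(np.fft.ifft2(kspace * mask))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
recon = undersample_kspace(image, keep_every=4)
error = np.mean((image - recon) ** 2)    # aliasing makes this nonzero
```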
Body-worn first-person vision (FPV) cameras make it possible to extract a rich source of information about the environment from the subject's point of view. However, compared with other activity settings (e.g., kitchens and outdoor ambulation), research progress in wearable-camera-based egocentric office activity recognition has been slow, mainly due to the lack of adequate datasets for training more sophisticated (e.g., deep learning) models for human activity recognition in office environments. This paper provides BON, a large, publicly available office activity dataset collected with a chest-mounted GoPro Hero camera in different office settings at three geographical locations: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). The BON dataset contains eighteen common office activities that can be grouped into person-to-person interactions (e.g., chatting with colleagues), person-to-object interactions (e.g., writing on a whiteboard), and proprioceptive activities (e.g., walking). Annotations are provided for 5-second video segments. In total, BON comprises 25 subjects and 2639 segments. To facilitate further research in this sub-domain, we also provide results that can serve as baselines for future studies.
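The 5-second segment annotation scheme can be sketched as a simple record layout; the `unknown` default label and the field layout below are our assumptions for illustration, not the dataset's actual schema:

```python
def segment_annotations(video_len_s, labels, seg_len_s=5):
    """Split a labelled recording into fixed-length (here 5-second)
    segments, the annotation granularity used by BON. `labels` maps a
    segment index to an activity name; unlabelled segments default to
    'unknown'."""
    n_segments = video_len_s // seg_len_s
    return [
        (i * seg_len_s, (i + 1) * seg_len_s, labels.get(i, "unknown"))
        for i in range(n_segments)
    ]

# A 20-second clip with two annotated segments.
segments = segment_annotations(20, {0: "chatting", 2: "walking"})
```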
In recent years, multiple object tracking (MOT) has attracted great interest from researchers and has become one of the trending problems in computer vision, especially with the recent developments in autonomous driving. MOT is a key vision task facing various challenges, such as occlusion in crowded scenes, similar appearances, difficulty in detecting small objects, ID switching, and so on. To tackle these challenges, researchers have tried to exploit the attention mechanism of transformers, the interrelations of tracklets with graph convolutional neural networks, and the appearance similarity of objects across frames with Siamese networks; they have also tried CNN-based networks with IoU matching and motion prediction with LSTMs. To bring these scattered techniques under one umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that recent researchers have focused on to solve the problems of MOT. We have enumerated numerous applications and possibilities, and discussed how MOT relates to real life. Our review attempts to show the different perspectives of the techniques researchers have used and to give some future directions for potential researchers. In addition, we include the popular benchmark datasets and metrics in this review.
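The IoU-matching association mentioned above is the simplest of the surveyed techniques to make concrete: each existing track is linked to the detection it overlaps most. A greedy sketch (practical trackers typically solve the assignment with the Hungarian algorithm instead):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def greedy_iou_match(tracks, detections, threshold=0.3):
    """Greedy IoU association: each track takes the highest-IoU unused
    detection above the threshold."""
    matches, used = {}, set()
    for t_id, t_box in tracks.items():
        best = max(((iou(t_box, d), i) for i, d in enumerate(detections)
                    if i not in used), default=(0.0, None))
        if best[1] is not None and best[0] >= threshold:
            matches[t_id] = best[1]
            used.add(best[1])
    return matches

tracks = {7: (0, 0, 10, 10)}
detections = [(100, 100, 110, 110), (1, 1, 11, 11)]
assignment = greedy_iou_match(tracks, detections)
# track 7 is matched to detection index 1 (the overlapping box)
```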
Computer-aided diagnosis (CAD) systems for skin lesion analysis are an emerging field of research with the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently shown growing interest in developing such CAD systems, with the aim of providing dermatologists with user-friendly tools that reduce the challenges posed by manual inspection. The purpose of this article is to provide a complete literature review of cutting-edge CAD techniques published between 2011 and 2020. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method was used to identify a total of 365 publications: 221 on skin lesion segmentation and 144 on skin lesion classification. These articles are analyzed and summarized in a number of different ways so that we can contribute vital information about the methods used to develop CAD systems. These aspects include: relevant and essential definitions and theories; input data (dataset utilization, preprocessing, augmentation, and handling of imbalance problems); method configuration (techniques, architectures, module frameworks, and losses); training strategies (hyperparameter settings); and evaluation criteria (metrics). We also examine a variety of performance-enhancing methods, including ensembling and post-processing. Furthermore, in this survey we highlight the primary problems associated with evaluating skin lesion segmentation and classification systems on minimal datasets, along with potential solutions to these dilemmas. In conclusion, enlightening findings, recommendations, and trends are discussed for the purpose of guiding future research in the relevant areas of concern. It is foreseen that this survey will guide researchers at all levels, from beginners to experts, in the process of developing automated and robust CAD systems for skin lesion analysis.
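Among the ensembling and post-processing strategies the survey covers, per-pixel majority voting over binary segmentation masks is a common generic recipe; a minimal sketch (a generic technique, not a specific method from the surveyed papers):

```python
def majority_vote(masks):
    """Fuse binary segmentation masks from an ensemble of models by
    per-pixel majority vote: a pixel is foreground if more than half of
    the masks mark it foreground."""
    n = len(masks)
    return [
        [1 if sum(m[r][c] for m in masks) * 2 > n else 0
         for c in range(len(masks[0][0]))]
        for r in range(len(masks[0]))
    ]

# Three 2x2 predicted lesion masks from three hypothetical models.
m1 = [[1, 0], [1, 1]]
m2 = [[1, 0], [0, 1]]
m3 = [[0, 0], [1, 0]]
fused = majority_vote([m1, m2, m3])
# fused == [[1, 0], [1, 1]]
```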